Simulation of distributions for the CLT/FLE experiment
Author
Stefano Dalla Bona
1. Fitting a Theoretical Distribution
To determine an appropriate Effect Size (ES) for detecting differences between two groups exposed to a null contingency illusory condition (one in a native language – NL, and one in a foreign language – FL), we first examine the expected distribution of the dependent variable for the NL group. This distribution is based on data from our previous study (Dalla Bona & Vicovaro, 2024). We then extend this analysis to consider the expected differences between the NL and FL groups.
Given that our current Contingency Learning Task (CLT) experiment will be conducted online using PsychoPy, and that we previously conducted an online experiment using it (with data available at: https://osf.io/c26qa/files/osfstorage), we use data from the prior study as a reference distribution. Specifically, we focus on the control group, which was not exposed to any manipulations of perceptual features, in the null contingency illusory condition. Our objective is to identify which generative distribution most likely underlies the observed empirical data points.
To identify potential candidate distributions, we employ a Cullen and Frey graph that plots kurtosis against the square of skewness (Cullen & Frey, 1999). This visualization helps us assess the distributional characteristics of the data and select the most appropriate generative model.
1.1 Importing Data and Describing It
We begin by loading the data from the Excel file and filtering it to include only participants with valid responses (valido == "si"). The tempoistr (time spent in the experiment) variable is then converted to numeric format, and we exclude any rows where the time spent is less than 10 seconds, as this was an exclusion criterion in the previous experiment. Next, we extract the causal evaluation scores for participants in the control group who were exposed to the null contingency scenario (identified by gruppo == 1). These scores are stored in the vector_illusion variable.
# Load required libraries
library(readxl)

# Read the data from the Excel file
data <- as.data.frame(read_xlsx("Dataframeclt_online.xlsx"))

# Display the structure of the dataset
str(data)
# Subset the first 209 rows
data <- data[1:209, ]

# Filter data for valid responses
data <- subset(data, data$valido == "si")

# Convert 'tempoistr' to numeric and filter out fast responders (time < 10 seconds)
data$tempoistr <- as.numeric(data$tempoistr)
data <- data[-c(which(data$tempoistr < 10)), ]

# Extract causal evaluation scores for the control group
vector_illusion <- data$vd[data$gruppo == 1]
The causal evaluation is treated as a metric (interval-like) variable, with scores ranging from 0 to 100. For the control group (N = 60), the distribution of the dependent variable is fairly spread out, with a mean of 58.58, a median of 63.5, and a standard deviation of 18.21. The range spans from a minimum of 16 to a maximum of 90, and 50% of the data falls between 49.5 and 73 (1st Quartile = 49.5, 3rd Quartile = 73). The distribution exhibits a moderate negative skew (skewness = -0.63), which may be partially attributable to the data being bounded between 0 and 100. Additionally, the distribution is platykurtic (kurtosis = -0.45), indicating a flatter shape compared to a normal distribution.
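The reported skewness and (excess) kurtosis can be reproduced with moment-based estimators; the helper below, `skew_kurt`, is a hypothetical sketch of such a computation, not the function used in the original analysis.

```r
# Hypothetical helper: moment-based skewness and excess kurtosis (normal = 0)
skew_kurt <- function(x) {
  m <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment
  m3 <- mean((x - m)^3)  # third central moment
  m4 <- mean((x - m)^4)  # fourth central moment
  c(skewness = m3 / m2^1.5, excess_kurtosis = m4 / m2^2 - 3)
}
# With the data loaded above: skew_kurt(vector_illusion)
# should be close to the reported -0.63 and -0.45
```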
To visualize this distribution, we use a combination of plots: the boxplot highlights the quartiles and potential outliers, the half-eye plot provides a kernel density estimate, and the dot plot emphasizes individual data points.
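The combined display described above can be sketched with ggdist (cited in the references); this is an illustrative reconstruction, with `vals` as a simulated stand-in for vector_illusion rather than the original plotting code.

```r
# Sketch of the combined summary plot (half-eye density, boxplot, dot plot)
library(ggplot2)
library(ggdist)

set.seed(1)
# Simulated stand-in for vector_illusion (bounded 0-100, similar mean/SD)
vals <- round(pmin(pmax(rnorm(60, 58.6, 18.2), 0), 100))

p <- ggplot(data.frame(y = vals), aes(x = "", y = y)) +
  stat_halfeye(adjust = 0.8, width = 0.5, justification = -0.3) +  # kernel density
  geom_boxplot(width = 0.15, outlier.shape = NA) +                 # quartiles
  geom_dotplot(binaxis = "y", stackdir = "down",
               dotsize = 0.6, binwidth = 2) +                      # raw data points
  coord_flip() +
  labs(x = NULL, y = "Causal evaluation (0-100)") +
  theme_bw()
p
```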
# Load required package for summary statistics
library(pastecs)

# View the length and summary of the vector
length(vector_illusion)
[1] 60
summary(vector_illusion)
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.00 49.50 63.50 58.58 73.00 90.00
1.2 Identifying the Candidate Distribution
In this section, we aim to identify the distribution that best describes our sample data. To do so, we compare our data to various theoretical distributions using the Cullen and Frey graph. This graph plots kurtosis against the square of skewness (Cullen & Frey, 1999), allowing us to visually assess how well our data aligns with different distribution families.
The data are represented as a red point on the graph, indicating how closely the distribution resembles known distribution families. To enhance the analysis, we also include bootstrapped data (generated from the sample) to assess how the distribution holds up under resampling.
To generate the graph and perform the analysis, we use the fitdistrplus package. The following code fits the data to various distributions and plots the Cullen and Frey graph, treating the variable as discrete (assuming the random variable takes countable values).
# Load the required package for distribution fitting
library(fitdistrplus)
Loading required package: MASS
Loading required package: survival
# Generate the Cullen and Frey graph with bootstrapping (1000 resamples)
descdist(vector_illusion, boot = 1000, discrete = TRUE)
We consider two candidate distributions for the data: the Normal distribution and the Poisson distribution. To determine the most suitable model, we perform a series of diagnostic plots and statistical tests that compare the empirical distribution of the data against the theoretical distributions.
The following diagnostic plots are used:
Density comparison (denscomp): Compares the histogram of the empirical data to the fitted density functions of the theoretical distributions.
Q-Q plot (qqcomp): Plots the theoretical quantiles of the fitted distributions against the quantiles of the empirical data.
CDF comparison (cdfcomp): Compares the cumulative distribution function (CDF) of the empirical data to the CDFs of the fitted distributions.
P-P plot (ppcomp): Plots the theoretical cumulative probabilities against the empirical cumulative probabilities.
From these diagnostic plots, we observe that the Normal distribution fits the data better than the Poisson distribution. This observation is further confirmed by the goodness-of-fit tests, which indicate that the Normal distribution is the most probable model for our data. Specifically, the Akaike weight for the Normal distribution is approximately 1, suggesting that it is the best-fitting model.
It is important to note that the Normal distribution fitted to the data is effectively truncated, particularly on the right-hand side. This truncation occurs because the mean of the distribution lies above the scale midpoint of 50, while the data are constrained within the boundaries of 0 to 100. As a result, values that would otherwise fall beyond the upper boundary are excluded, which skews the distribution and leaves a heavier left tail. Based on these observations, we assume that the truncated Normal distribution is the most appropriate generative distribution for further analysis.
# Fit distributions to the data
fnorm <- fitdist(vector_illusion, "norm")  # Fitting Normal distribution
fpois <- fitdist(vector_illusion, "pois")  # Fitting Poisson distribution

plot.legend <- c("Normal", "Poisson")
par(mfrow = c(2, 2))

# Plot the density comparison
denscomp(list(fnorm, fpois), legendtext = plot.legend)

# Plot the Q-Q plot comparison
qqcomp(list(fnorm, fpois), legendtext = plot.legend)

# Plot the CDF comparison
cdfcomp(list(fnorm, fpois), legendtext = plot.legend)

# Plot the P-P plot comparison
ppcomp(list(fnorm, fpois), legendtext = plot.legend)
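The Akaike weights mentioned above can be computed from the AIC values of the fitted objects; the function below is an illustrative sketch (the original analysis may have computed the weights differently).

```r
# Sketch: Akaike weights from a vector of AIC values.
# With the fits above, one would call:
#   akaike_weights(c(norm = fnorm$aic, pois = fpois$aic))
akaike_weights <- function(aic) {
  delta <- aic - min(aic)               # AIC differences from the best model
  exp(-delta / 2) / sum(exp(-delta / 2))  # normalized model weights
}

# Illustrative AIC values: the model with the much lower AIC gets weight ~1
akaike_weights(c(norm = 500, pois = 560))
```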
2. Simulating Data and Estimating the Effect Size
In this section, we employ a simulation-based approach to estimate a meaningful ES. Specifically, we aim to generate simulated data from a theoretical distribution (the truncated Normal distribution) and hypothesize about the potential meaningful difference between two groups, namely NL (i.e., the control group) and FL (i.e., the treatment group).
2.1 Simulating the Distribution
We have already established that the Normal distribution is the most likely model for our data. However, we must account for the fact that our data is discrete and constrained within the boundaries of 0 and 100. To address these constraints, we simulate the data using a truncated normal distribution, which respects both the boundaries and the characteristics of the data. The truncated normal distribution ensures that all generated values fall within the specified range of 0 to 100.
We will use a custom function, based on the qnorm and runif functions, to generate truncated normal data:
qnorm: The quantile function for the normal distribution, which transforms uniformly distributed random numbers into quantiles of the normal distribution.
runif: The function used to generate uniformly distributed random numbers, which are then transformed by qnorm into the desired normal distribution.
round: Since our data needs to be discrete, we round the simulated values to the nearest integer.
By imposing boundaries on the simulated data, we truncate the distribution. This means that any data points that would normally fall outside the specified range (0 to 100) are trimmed. Consequently, truncation introduces skewness in the distribution, particularly when the mean is positioned closer to one of the boundaries. This behavior is expected when dealing with bounded data.
When the mean parameter passed to the function is more extreme (i.e., closer to a boundary), the realized mean of the simulated data is pulled slightly back toward the center of the scale, because truncation removes values in the nearer tail. For the same reason, the standard deviation (SD) of the simulated data shrinks as the mean parameter moves toward a boundary: truncation limits the spread of the data. This behavior is expected when dealing with bounded data.
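The pull of the mean and the shrinkage of the SD can also be derived analytically from the standard formulas for the moments of a normal distribution truncated to [a, b]; the helper below is an added illustration (not part of the original analysis) and reproduces the pattern seen in the simulations that follow.

```r
# Analytic mean and SD of a Normal(mean, sd) truncated to [a, b]
trunc_moments <- function(mean, sd, a = 0, b = 100) {
  alpha <- (a - mean) / sd
  beta  <- (b - mean) / sd
  Z <- pnorm(beta) - pnorm(alpha)  # probability mass inside [a, b]
  m <- mean + sd * (dnorm(alpha) - dnorm(beta)) / Z
  v <- sd^2 * (1 + (alpha * dnorm(alpha) - beta * dnorm(beta)) / Z -
                 ((dnorm(alpha) - dnorm(beta)) / Z)^2)
  c(mean = m, sd = sqrt(v))
}

# Parameters mean = 65, sd = 20 on [0, 100]: the realized mean is pulled
# below 65 and the realized SD falls below 20
trunc_moments(65, 20)
```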
# Custom function to generate data from a truncated normal distribution
tnorm_f <- function(n, mean, sd, a = 0, b = 100) {
  # Generate uniform random values between the CDF values at the bounds,
  # then convert them into quantiles of the normal distribution
  qnorm(runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd)), mean, sd)
}

# Simulate data using the truncated normal distribution
set.seed(20242025)
x <- round(tnorm_f(length(vector_illusion), 58.58, 18.21))  # Truncated at 0 and 100

# Load plotting packages
library(ggplot2)
library(ggokabeito)

# Plot histogram
ggplot(mapping = aes(x, fill = factor(1))) +
  geom_histogram(binwidth = 10, color = "black") +
  theme_bw() +
  scale_fill_okabe_ito(order = 2) +
  theme(legend.position = "none") +
  labs(x = "Values", y = "Frequencies")
# Simulate data with different means to observe the effects of truncation
a <- round(tnorm_f(10e3, 50, 20))  # Mean = 50
b <- round(tnorm_f(10e3, 60, 20))  # Mean = 60
c <- round(tnorm_f(10e3, 70, 20))  # Mean = 70
d <- round(tnorm_f(10e3, 30, 20))  # Mean = 30

# Plot histogram, Mean = 50
ggplot(mapping = aes(a, fill = factor(1))) +
  geom_histogram(binwidth = 10, color = "black") +
  theme_bw() +
  scale_fill_okabe_ito(order = 3) +
  theme(legend.position = "none") +
  labs(x = "Values", y = "Frequencies")
# Plot histogram, Mean = 60
ggplot(mapping = aes(b, fill = factor(1))) +
  geom_histogram(binwidth = 10, color = "black") +
  theme_bw() +
  scale_fill_okabe_ito(order = 4) +
  theme(legend.position = "none") +
  labs(x = "Values", y = "Frequencies")
# Plot histogram, Mean = 70
ggplot(mapping = aes(c, fill = factor(1))) +
  geom_histogram(binwidth = 10, color = "black") +
  theme_bw() +
  scale_fill_okabe_ito(order = 5) +
  theme(legend.position = "none") +
  labs(x = "Values", y = "Frequencies")
# Plot histogram, Mean = 30
ggplot(mapping = aes(d, fill = factor(1))) +
  geom_histogram(binwidth = 10, color = "black") +
  theme_bw() +
  scale_fill_okabe_ito(order = 6) +
  theme(legend.position = "none") +
  labs(x = "Values", y = "Frequencies")
# Simulate more data with different means to observe the effects of truncation
x <- round(tnorm_f(10000, 50, 20))  # Mean = 50
y <- rnorm(10000, 50, 20)           # Untruncated normal data for comparison

# Calculate means and SDs for truncated and untruncated data
mean(x); sd(x)
[1] 49.7048
[1] 19.04769
mean(y); sd(y)
[1] 49.99585
[1] 20.04542
# Simulate with a higher mean to observe the effect of truncation more clearly
x <- round(tnorm_f(10000, 65, 20))  # Mean = 65
y <- rnorm(10000, 65, 20)           # Untruncated normal data for comparison

# Calculate means and SDs for the higher mean
mean(x); sd(x)
[1] 63.1889
[1] 18.2378
mean(y); sd(y)
[1] 64.75427
[1] 19.90675
2.2 Comparisons
In this section, we compare how well the truncated normal distribution simulates the empirical data, relative to the normal distribution without boundaries. We visually examine the three distributions (empirical, normal, and truncated normal) and compute the overlap indices between the empirical data and the theoretical distributions. The overlap metric quantifies how much the distributions overlap, providing insight into how well each simulated distribution fits the empirical data.
From this analysis, we observe that the truncated normal distribution with a mean of 60 and a standard deviation of 20 (closely approximating the empirical mean of 58.58 and standard deviation of 18.21) provides a marginally better fit for the data than the normal distribution without boundaries. We retain the truncated normal distribution because, as the mean parameter approaches more extreme values, this simulation approach should better capture the behavior of the data. Specifically, the truncated normal distribution allows us to model the data's characteristics more effectively based on where the mean falls within the bounded range.
# Simulate data for normal and truncated normal distributions
set.seed(423)
simulated_norm <- round(rnorm(1000000, mean(vector_illusion), sd(vector_illusion)))
simulated_trunc <- round(tnorm_f(1000000, 60, 20))

# Plot the distributions using ggplot
empirical_df <- data.frame(value = vector_illusion, type = "Empirical")
simulated_df <- data.frame(value = simulated_norm, type = "Simulated Norm")
simulated_df1 <- data.frame(value = simulated_trunc, type = "Simulated Trunc")
combined_df <- rbind(empirical_df, simulated_df, simulated_df1)

ggplot(combined_df, aes(x = value, col = type)) +
  geom_density(alpha = 0.5) +
  scale_color_okabe_ito() +
  labs(title = "Empirical and Simulated Data Distributions",
       x = "Value",
       y = "Density") +
  theme_bw() +
  theme(legend.position = "top")
# Load the required package
library(overlapping)
Loading required package: testthat
# Calculate the overlap coefficient with the normal distribution
overlap_norm_values <- rep(NA, 1000)
for (i in 1:1000) {
  simulated_sample <- round(rnorm(length(vector_illusion), 58.58, 18.41))
  overlap_norm_values[i] <- overlap(list(simulated_sample, vector_illusion), type = "1")[1]
}
median(as.numeric(overlap_norm_values))
[1] 0.8612745
# Calculate the overlap coefficient with the truncated normal distribution
overlap_trunc_values <- rep(NA, 1000)
for (i in 1:1000) {
  simulated_sample <- round(tnorm_f(length(vector_illusion), 60, 20))
  overlap_trunc_values[i] <- overlap(list(simulated_sample, vector_illusion), type = "1")[1]
}
median(as.numeric(overlap_trunc_values))
[1] 0.8797271
2.3 ES estimation
In this analysis, we are focused on estimating a meaningful ES within the context of a Likert scale ranging from 0 to 100. Here, 0 represents the lowest possible rating (i.e., “Definitely not”), and 100 represents the highest possible rating (i.e., “Definitely yes”). A 10-point change on this scale can be defined as a meaningful ES, corresponding to a 1/10 shift in the evaluation of causality. This shift could represent the degree to which participants’ evaluations of a causal relationship differ when presented in a FL versus their NL. This shift is plausible and reflects a moderate change in the associative strength between the cue and the outcome, providing a clear benchmark for evaluating the effect of the manipulation.
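The benchmark above can be translated into a standardized effect size with simple arithmetic: a 10-point raw shift divided by a common SD of 20 gives a Cohen's d of 0.5, which anticipates the simulation results reported below.

```r
# Back-of-envelope check of the benchmark: raw shift / common SD = Cohen's d
raw_diff <- 10   # hypothesized meaningful shift on the 0-100 scale
sd_common <- 20  # assumed common SD of both groups (before truncation effects)
d_target <- raw_diff / sd_common
d_target  # 0.5, a conventionally moderate effect
```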
To simulate data for the NL group, we assume that a truncated normal distribution with a mean rating of 60 and a standard deviation (SD) of 20 is the generative distribution. This parametrization is based on the fit of data from our previous experiment, where the observed mean was 58.58 and the SD was 18.21. Using this distribution, the simulation produced a mean of 58.77 and a SD of 18.57, which closely mirrors the characteristics of the control group data.
For the FL group, we simulate a second truncated normal distribution with the mean shifted 10 points lower, representing the effect size. This results in a mean of 50, while the standard deviation remains the same at 20, ensuring that the spread of values in the FL group is similar to the NL group. The simulation for the FL group yielded a mean of 49.35 and a standard deviation of 19.07.
When parameterizing a truncated normal distribution, we observe that the more extreme the initial parameters (such as a large shift in the mean), the more the realized mean is pulled inward, while the standard deviation also decreases. This behavior is expected, as truncation naturally compresses the distribution. In the next simulations, however, we ensure that the difference in realized means between the two groups remains at least 9.9, slightly adjusting the initial parametrization where necessary, because the truncated distribution can otherwise produce a smaller difference than the target 10-point shift.
# Load libraries
library(effectsize)

# Simulate data for NL (Mean = 60, SD = 20)
set.seed(4234)
a <- round(tnorm_f(10000, 60, 20))

# Simulate data for FL (Mean = 60 - 10, SD = 20)
b <- round(tnorm_f(10000, mean(a) - 10, 20))  # FL group

# Calculate the means and SDs for both groups
mean(a); sd(a)  # NL group
[1] 58.7796
[1] 18.57298
mean(b); sd(b) # FL group
[1] 49.3576
[1] 19.07942
# Plot the densities
data <- data.frame(value = c(a, b),
                   group = factor(rep(c("NL", "FL"), each = length(a))))

ggplot(data, aes(x = value, fill = group)) +
  geom_density(alpha = 0.5) +
  labs(x = "Values", y = "Density") +
  scale_fill_okabe_ito(order = c(7, 8)) +
  theme_bw() +
  theme(legend.title = element_blank())
# Calculate Cohen's d
cohens_d(a, b)
Cohen's d | 95% CI
------------------------
0.50 | [0.47, 0.53]
- Estimated using pooled SD.
We are uncertain about the exact mean for the NL group in the upcoming experiment, as some features are expected to change. However, we can assume that the NL group mean will vary within a range of -5 to +5 points compared to the previously observed data. Since the SD is likely to decrease when the mean shifts toward the extremes of the distribution and increase when the mean moves toward the center, we will use a fixed SD parameter value of 20 for all the simulated truncated normal distributions.
The mean of the FL group is set to be 10 points lower than the corresponding NL group mean, and we maintain the SD parameter at 20. The ES for each pair of NL and FL distributions is calculated using Cohen's d. The following code simulates control and treatment group data for a range of NL group means, specifically from 55 to 65. For each NL mean, it calculates Cohen's d as a measure of the ES between the NL and FL groups. The simulation ensures that the mean difference between the two groups is always at least 10 points, adjusting the FL group mean parameter as necessary to meet this condition.
The output includes a visualization of how Cohen’s d varies across the range of NL group means, from 55 to 65. The histogram displays the distribution of the calculated effect sizes, and a line plot shows how the effect size changes as the NL group mean shifts. The results suggest that the hypothesized ES is moderate, as in all simulations, Cohen’s d did not drop below 0.5.
# Define the function to simulate data
calculate_effect_size <- function(NLmean, FLdiff, n = 10000, sd = 20) {
  v <- rep(NA, 5)
  control_data <- round(tnorm_f(n, NLmean, sd))             # Simulate NL group
  treatment_data <- round(tnorm_f(n, NLmean - FLdiff, sd))  # Simulate FL group
  x <- 0
  # Ensure the difference between means is at least 9.9
  if (mean(control_data) - mean(treatment_data) < 9.9) {
    while (mean(control_data) - mean(treatment_data) < 9.9) {
      x <- x + 0.0005
      treatment_data <- round(tnorm_f(n, NLmean - (FLdiff + x), sd))
      # Break when the mean difference is >= 9.9
      if (mean(control_data) - mean(treatment_data) >= 9.9) {
        break
      }
    }
  }
  d <- as.numeric(cohens_d(control_data, treatment_data)[1])
  v[1] <- d
  v[2] <- mean(control_data)
  v[3] <- sd(control_data)
  v[4] <- mean(treatment_data)
  v[5] <- sd(treatment_data)
  return(v)
}

# Define the range of NL group means from 55 to 65
NLmeans <- seq(55, 65, by = 0.1)

# Vectors to store the results
mean1 <- rep(NA, length(NLmeans))
mean2 <- rep(NA, length(NLmeans))
sd1 <- rep(NA, length(NLmeans))
sd2 <- rep(NA, length(NLmeans))
cohen <- rep(NA, length(NLmeans))

# Loop through each possible control group mean
for (i in 1:length(NLmeans)) {
  vector <- calculate_effect_size(NLmeans[i], 10, n = 100000)
  cohen[i] <- vector[1]
  mean1[i] <- vector[2]
  mean2[i] <- vector[4]
  sd1[i] <- vector[3]
  sd2[i] <- vector[5]
}

# Collect the results in a data frame
effect_size_df <- data.frame(NLmean = mean1,
                             FLmean = mean2,
                             NLsd = sd1,
                             FLsd = sd2,
                             d = cohen)

round(effect_size_df$NLmean - effect_size_df$FLmean, 1)
# Distribution of Cohen's d values
ggplot(effect_size_df, aes(x = d)) +
  geom_histogram(fill = "darkorange") +
  labs(y = "Frequencies", x = "Cohen's d") +
  theme_bw()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Cohen's d for various control group means
ggplot(effect_size_df, aes(x = NLmean, y = d)) +
  geom_line() +
  geom_point() +
  labs(x = "NL Group Mean", y = "Cohen's d") +
  theme_bw()
The goal of this final simulation is to hypothesize that the treatment group (FL) will show greater variability not only due to a reduction in the mean of the truncated normal distribution but also because individuals in the FL group might show differing levels of response to the treatment. To account for this increased variability, we will simulate different treatment group standard deviations (SDs), ranging from 20 to 25, with a step size of 0.5.
Finally, we store the results in a data frame that includes Cohen's d; these values constitute our prior knowledge about the expected ES and will be used in future analyses. We also store the updated mean parameterization of the FL group. In each simulation, the FL group's mean parameter has been adjusted so that the difference between the control and treatment group means is always at least 9.9. This updated mean will be useful for design analysis, as it maintains the required mean difference and allows an accurate representation of the distribution in subsequent simulations.
# Creation of the function
calculate_effect_size_with_varying_treatment_sd <- function(NLmean, NLsd, FLdiff,
                                                            treatment_sd, n = 10000) {
  # Simulate NL group data with fixed SD = 20
  control_data <- round(tnorm_f(n, NLmean, NLsd))
  # Simulate treatment group data with fixed mean difference and varying SD
  treatment_data <- round(tnorm_f(n, NLmean - FLdiff, treatment_sd))
  # Ensure that the difference between means is at least 9.9
  x <- 0
  while (mean(control_data) - mean(treatment_data) < 9.9) {
    x <- x + 0.0005
    treatment_data <- round(tnorm_f(n, NLmean - (FLdiff + x), treatment_sd))
  }
  # Cohen's d
  d <- as.numeric(cohens_d(control_data, treatment_data, pooled_sd = TRUE)[1])
  # Return the results and new parameters to store
  return(c(d, NLmean - (FLdiff + x), mean(control_data), sd(control_data),
           NLmean, mean(treatment_data), sd(treatment_data), treatment_sd))
}

# Define the range of NL group means (55 to 65)
control_means <- seq(55, 65, by = 0.5)

# Define the range of FL group standard deviations (20 to 25)
treatment_sds <- seq(20, 25, by = 0.5)

# Fixed NL group SD
control_sd <- 20

# Vectors to store results
n_combos <- length(control_means) * length(treatment_sds)
mean1_var <- rep(NA, n_combos)
mean1_true <- rep(NA, n_combos)
mean2_var <- rep(NA, n_combos)
mean2_true <- rep(NA, n_combos)
sd1_var <- rep(NA, n_combos)
sd2_var <- rep(NA, n_combos)
cohen_var <- rep(NA, n_combos)
sd_true <- rep(NA, n_combos)
counter <- 1

# For each combination of control mean and treatment SD
for (mean_val in control_means) {
  for (treatment_sd_val in treatment_sds) {
    vector <- calculate_effect_size_with_varying_treatment_sd(mean_val, control_sd, 10,
                                                              treatment_sd_val, n = 100000)
    cohen_var[counter] <- vector[1]
    mean1_true[counter] <- vector[2]
    mean1_var[counter] <- vector[3]
    mean2_true[counter] <- vector[5]
    mean2_var[counter] <- vector[6]
    sd1_var[counter] <- vector[4]
    sd2_var[counter] <- vector[7]
    sd_true[counter] <- vector[8]
    counter <- counter + 1
  }
}

effect_size_treatment_sd_df <- data.frame(NLmean = mean1_var,
                                          NLmeantrue = mean1_true,
                                          FLmean = mean2_var,
                                          FLmeantrue = mean2_true,
                                          NLsd = sd1_var,
                                          FLsd = sd2_var,
                                          d = cohen_var,
                                          sd = sd_true)

# Cohen's d distribution
ggplot(effect_size_treatment_sd_df, aes(x = d)) +
  geom_histogram(fill = "darkorange") +
  labs(y = "Frequencies", x = "Cohen's d") +
  theme_bw()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Cohen's d for different NL means and FL SD values
ggplot(effect_size_treatment_sd_df, aes(x = FLmeantrue, y = d, color = factor(sd))) +
  geom_line() +
  geom_point() +
  labs(x = "NL Group Mean", y = "Cohen's d") +
  scale_color_viridis_d() +
  theme_bw() +
  geom_smooth(method = "lm", alpha = 0.2) +
  theme(legend.position = "bottom")
References
Barrett M (2024). ggokabeito: ‘Okabe-Ito’ Scales for ‘ggplot2’ and ‘ggraph’. R package version 0.1.0.9000, commit e28e8b7a0a3301ac40722fb07ed082bde424bb8f, https://github.com/malcolmbarrett/ggokabeito.
Ben-Shachar M, Lüdecke D, Makowski D (2020). effectsize: Estimation of Effect Size Indices and Standardized Parameters. Journal of Open Source Software, 5(56), 2815. doi: 10.21105/joss.02815
Cullen, A. C., & Frey, H. C. (1999). Probabilistic techniques in exposure assessment: a handbook for dealing with variability and uncertainty in models and inputs. Springer Science & Business Media.
Dalla Bona, S., & Vicovaro, M. (2024). Does perceptual disfluency affect the illusion of causality?. Quarterly Journal of Experimental Psychology, 77(8), 1727-1744. https://doi.org/10.1177/17470218231220928
Delignette-Muller, M.L., Dutang, C. (2015). fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software, 64(4), 1-34. DOI 10.18637/jss.v064.i04.
Kay, M. (2024). “ggdist: Visualizations of Distributions and Uncertainty in the Grammar of Graphics.” IEEE Transactions on Visualization and Computer Graphics, 30(1), 414-424. https://doi.org/10.1109/TVCG.2023.3327195.
Pastore M, Loro PAD, Mingione M, Calcagni’ A (2022). overlapping: Estimation of Overlapping in Empirical Distributions. R package version 2.1, https://CRAN.R-project.org/package=overlapping.
Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.